Continuum Armed Bandit Problem of Few Variables in High Dimensions
Authors
Abstract
We consider the stochastic and adversarial settings of continuum armed bandits where the arms are indexed by $[0,1]^d$. The reward functions $r : [0,1]^d \to \mathbb{R}$ are assumed to intrinsically depend on at most $k$ coordinate variables, so that $r(x_1, \dots, x_d) = g(x_{i_1}, \dots, x_{i_k})$ for distinct and unknown $i_1, \dots, i_k \in \{1, \dots, d\}$ and some locally Hölder continuous $g : [0,1]^k \to \mathbb{R}$ with exponent $\alpha \in (0,1]$. First, assuming $(i_1, \dots, i_k)$ to be fixed across time, we propose a simple modification of the CAB1 algorithm in which we construct the discrete set of sampling points so as to obtain a regret bound of $O\big(n^{\frac{\alpha+k}{2\alpha+k}} (\log n)^{\frac{\alpha}{2\alpha+k}} C(k,d)\big)$, with $C(k,d)$ depending at most polynomially on $k$ and sub-logarithmically on $d$. The construction is based on creating partitions of $\{1, \dots, d\}$ into $k$ disjoint subsets and is probabilistic; hence our result holds with high probability. Second, we extend our results to the more general case where $(i_1, \dots, i_k)$ can change over time and derive regret bounds for that case as well.
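The partition-based construction of sampling points can be illustrated with a small sketch. This is one plausible reading of the abstract, not the paper's exact procedure: each random partition assigns every coordinate in {1, ..., d} to one of k blocks, and each sampling point gives all coordinates in a block the same grid value. If some partition places the k active coordinates in distinct blocks, the resulting points cover every grid configuration of those coordinates. The function names, the grid resolution `m`, and the brute-force stopping rule are assumptions made for illustration only.

```python
import itertools
import random


def random_partition(d, k, rng):
    """Assign each of d coordinates a block label in {0, ..., k-1}.

    The labels define a partition of the coordinates into k disjoint
    (possibly empty) subsets.
    """
    return [rng.randrange(k) for _ in range(d)]


def separates(labels, active):
    """True if the coordinates in `active` all fall in distinct blocks."""
    return len({labels[i] for i in active}) == len(active)


def draw_separating_family(d, k, rng):
    """Draw random partitions until every k-subset of coordinates is
    separated by at least one of them (brute force; small d only)."""
    uncovered = set(itertools.combinations(range(d), k))
    family = []
    while uncovered:
        labels = random_partition(d, k, rng)
        newly = {s for s in uncovered if separates(labels, s)}
        if newly:
            family.append(labels)
            uncovered -= newly
    return family


def sampling_points(family, k, m):
    """Build points in [0,1]^d: for each partition and each choice of k
    grid values, every coordinate takes the value of its block."""
    grid = [j / m for j in range(m + 1)]
    points = set()
    for labels in family:
        for values in itertools.product(grid, repeat=k):
            points.add(tuple(values[b] for b in labels))
    return sorted(points)


if __name__ == "__main__":
    rng = random.Random(0)
    family = draw_separating_family(d=6, k=2, rng=rng)
    points = sampling_points(family, k=2, m=2)
    print(len(family), "partitions,", len(points), "sampling points")
```

A finite-armed bandit algorithm could then be run over `points` in place of a full d-dimensional grid. The brute-force check over all k-subsets is only feasible for small d; the abstract indicates that the paper instead relies on a probabilistic guarantee that holds with high probability.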
Related articles
Stochastic Continuum Armed Bandit Problem of Few Linear Parameters in High Dimensions (Hemant Tyagi, Sebastian Stich and Bernd Gärtner)
We consider a stochastic continuum armed bandit problem where the arms are indexed by the $\ell_2$ ball $B_d(1+\nu)$ of radius $1+\nu$ in $\mathbb{R}^d$. The reward functions $r : B_d(1+\nu) \to \mathbb{R}$ are considered to intrinsically depend on $k \ll d$ unknown linear parameters so that $r(x) = g(Ax)$ where $A$ is a full rank $k \times d$ matrix. Assuming the mean reward function to be smooth we make use of results from low-rank matrix recovery l...
Improved Rates for the Stochastic Continuum-Armed Bandit Problem
Considering one-dimensional continuum-armed bandit problems, we propose an improvement of an algorithm of Kleinberg and a new set of conditions which give rise to improved rates. In particular, we introduce a novel assumption that is complementary to the previous smoothness conditions, while at the same time smoothness of the mean payoff function is required only at the maxima. Under these new ...
Showing Relevant Ads via Context Multi-Armed Bandits
We study context multi-armed bandit problems where the context comes from a metric space and the payoff satisfies a Lipschitz condition with respect to the metric. Abstractly, a context multi-armed bandit problem models a situation where, in a sequence of independent trials, an online algorithm chooses an action based on a given context (side information) from a set of possible actions so as to...
Medoids in almost linear time via multi-armed bandits
Computing the medoid of a large number of points in high-dimensional space is an increasingly common operation in many data science problems. We present an algorithm Med-dit which uses O(n log n) distance evaluations to compute the medoid with high probability. Med-dit is based on a connection with the multi-armed bandit problem. We evaluate the performance of Med-dit empirically on the Netflix...
Publication date: 2013